poisson (Poisson distribution)#

The Poisson distribution models the number of events that occur in a fixed exposure (time, area, volume, etc.) when events happen independently at a constant average rate.

This notebook uses the same parameterization as scipy.stats.poisson:

  • mu (often written (\lambda)) = expected count in the exposure, (\mu \ge 0)

Learning goals#

By the end you should be able to:

  • recognize when a Poisson model is appropriate (and when it isn’t)

  • write down the PMF/CDF and key properties

  • derive the mean, variance, and likelihood / MLE

  • implement Poisson sampling using NumPy only

  • visualize PMF/CDF and validate with Monte Carlo simulation

  • use scipy.stats.poisson for computation and basic estimation workflows

Table of contents#

  1. Title & Classification

  2. Intuition & Motivation

  3. Formal Definition

  4. Moments & Properties

  5. Parameter Interpretation

  6. Derivations

  7. Sampling & Simulation

  8. Visualization

  9. SciPy Integration

  10. Statistical Use Cases

  11. Pitfalls

  12. Summary

import math

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=6, suppress=True)

rng = np.random.default_rng(42)

1) Title & Classification#

Name: poisson (Poisson distribution)
Type: Discrete
Support: (k \in {0, 1, 2, \dots})
Parameter space: (\lambda \in [0, \infty)) (called mu in SciPy)

A useful interpretation is (\lambda = r \times \text{exposure}), where (r) is an event rate per unit exposure (e.g., “calls per hour”).

2) Intuition & Motivation#

What this distribution models#

A Poisson random variable (X) counts how many events occur in a fixed exposure when:

  • events occur one-at-a-time,

  • the event rate is approximately constant over the exposure,

  • counts in disjoint sub-intervals are independent (independent increments).

A canonical construction is the Poisson process: if events arrive at rate (r) per unit time, then the number of arrivals in an interval of length (T) is [ X \sim \text{Poisson}(\lambda), \qquad \lambda = rT. ]
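This construction is easy to check by simulation. The sketch below (using a fresh NumPy `Generator`) draws exponential inter-arrival times at rate (r) and counts how many arrivals land in ([0, T]); the counts should have mean and variance close to (rT):

```python
import numpy as np

rng_demo = np.random.default_rng(0)

r, T = 3.0, 2.0  # rate per unit time, window length; lambda = r*T = 6
n_rep = 50_000

# Draw more Exp(r) inter-arrival gaps than could plausibly be needed, then
# count how many cumulative arrival times fall inside [0, T] per replication.
max_events = int(r * T + 10 * np.sqrt(r * T) + 20)
gaps = rng_demo.exponential(scale=1.0 / r, size=(n_rep, max_events))
arrivals = gaps.cumsum(axis=1)
counts = (arrivals <= T).sum(axis=1)

print(counts.mean(), counts.var())  # both should be close to r*T = 6
```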

Typical real-world use cases#

  • call arrivals to a helpdesk per minute

  • defects per meter of manufactured material

  • photons counted by a sensor in a fixed time window

  • insurance claims per policy-year

  • mutations in a stretch of DNA

  • web requests to an endpoint per second

In applied modeling, (\lambda) often depends on covariates (Poisson regression / GLMs) and on exposure (an offset).

Relations to other distributions#

  • Binomial limit (rare events): if (X_n \sim \text{Bin}(n, p)) with large (n), small (p), and (np \to \lambda), then (X_n \Rightarrow \text{Poisson}(\lambda)).

  • Exponential / Gamma waiting times: in a Poisson process with rate (r), inter-arrival times are (\text{Exp}(r)), and the waiting time to the (k)-th event is (\text{Gamma}(k, r)).

  • Additivity: if (X_i \sim \text{Poisson}(\lambda_i)) are independent, then (\sum_i X_i \sim \text{Poisson}(\sum_i \lambda_i)).

  • Thinning: if you keep each event independently with probability (p), the kept count is (\text{Poisson}(p\lambda)).

  • Gamma–Poisson mixture: if (\lambda) is random with a Gamma distribution, the marginal count is Negative Binomial (useful for over-dispersion).

  • Normal approximation: for large (\lambda), (\text{Poisson}(\lambda)) is close to (\mathcal{N}(\lambda, \lambda)) with a continuity correction.
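The additivity and thinning properties can be verified with a quick Monte Carlo sketch (a fresh `Generator`, arbitrary parameter choices):

```python
import numpy as np

rng_demo = np.random.default_rng(1)
n = 200_000

# Additivity: Poisson(2) + Poisson(3) behaves like Poisson(5).
s = rng_demo.poisson(2.0, n) + rng_demo.poisson(3.0, n)
print(s.mean(), s.var())  # both close to 5

# Thinning: keep each of Poisson(10) events with probability 0.3 -> Poisson(3).
x = rng_demo.poisson(10.0, n)
kept = rng_demo.binomial(x, 0.3)
print(kept.mean(), kept.var())  # both close to 3
```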

3) Formal Definition#

PMF#

For (\lambda \ge 0) and (k \in {0,1,2,\dots}), the probability mass function (PMF) is [ \Pr(X = k \mid \lambda) = \frac{e^{-\lambda} \lambda^k}{k!}. ]

CDF#

The cumulative distribution function (CDF) is [ F(k;\lambda) = \Pr(X \le k) = e^{-\lambda} \sum_{j=0}^{k} \frac{\lambda^j}{j!}. ]

Using the upper incomplete gamma function (\Gamma(s, x)), we can write [ F(k;\lambda) = \frac{\Gamma(k+1, \lambda)}{\Gamma(k+1)} = Q(k+1, \lambda), ] where (Q) is the regularized upper incomplete gamma function (implemented in SciPy as scipy.special.gammaincc).
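The incomplete-gamma identity can be checked numerically against a direct partial sum of the PMF (a small sketch; the value of `mu` is arbitrary):

```python
import math

import numpy as np
from scipy.special import gammaincc

mu = 4.2
ks = np.arange(0, 20)

# CDF via the regularized upper incomplete gamma: F(k; mu) = Q(k+1, mu).
cdf_gamma = gammaincc(ks + 1.0, mu)

# CDF via a direct partial sum of the PMF.
pmf = np.exp(-mu) * mu**ks / np.array([math.factorial(k) for k in ks], dtype=float)
cdf_sum = pmf.cumsum()

print(np.max(np.abs(cdf_gamma - cdf_sum)))  # agreement to floating-point accuracy
```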

4) Moments & Properties#

Mean and variance#

[ \mathbb{E}[X] = \lambda, \qquad \mathrm{Var}(X) = \lambda. ] A key modeling implication is equidispersion: mean equals variance.

Skewness and kurtosis#

For (\lambda > 0): [ \text{skewness} = \frac{1}{\sqrt{\lambda}}, \qquad \text{excess kurtosis} = \frac{1}{\lambda} \quad(\text{kurtosis} = 3 + \tfrac{1}{\lambda}). ]

MGF and characteristic function#

The moment generating function (MGF) and characteristic function are [ M_X(t) = \mathbb{E}[e^{tX}] = \exp\big(\lambda(e^t - 1)\big), \qquad \varphi_X(t) = \mathbb{E}[e^{itX}] = \exp\big(\lambda(e^{it} - 1)\big). ]
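The MGF formula can be sanity-checked against a Monte Carlo estimate of (\mathbb{E}[e^{tX}]) (a sketch with arbitrary (\lambda) and (t); agreement is up to sampling error):

```python
import numpy as np

rng_demo = np.random.default_rng(2)
lam, t = 3.0, 0.4

samples = rng_demo.poisson(lam, 500_000)
mgf_mc = np.exp(t * samples).mean()               # Monte Carlo E[exp(t X)]
mgf_formula = np.exp(lam * (np.exp(t) - 1.0))     # exp(lambda (e^t - 1))

print(mgf_mc, mgf_formula)  # should agree to within Monte Carlo error
```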

Entropy#

There is no simple closed form for the Shannon entropy. One convenient expression (in nats) is [ H(X) = -\sum_{k=0}^{\infty} p(k)\,\log p(k), \qquad p(k)=\frac{e^{-\lambda}\lambda^k}{k!}. ] For large (\lambda), a useful approximation is [ H(X) \approx \tfrac{1}{2}\log(2\pi e\lambda) - \frac{1}{12\lambda} + \mathcal{O}(\lambda^{-2}). ]

Other useful properties#

  • Mode: (\lfloor\lambda\rfloor) is a mode; if (\lambda) is an integer, both (\lambda-1) and (\lambda) are modes.

  • Factorial moments: (\mathbb{E}[X(X-1)\cdots(X-m+1)] = \lambda^m).

  • Closure under sums: sums of independent Poisson variables remain Poisson.
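The mode and factorial-moment facts above can be checked directly from the PMF formula (a self-contained sketch; the truncation at `k = 59` leaves negligible tail mass for (\lambda = 6)):

```python
import math

import numpy as np

lam = 6.0
ks = np.arange(0, 60)
pmf = np.exp(-lam + ks * np.log(lam) - np.array([math.lgamma(k + 1.0) for k in ks]))

# Mode: floor(lam); for integer lam, lam-1 and lam tie (up to floating point).
print(int(pmf.argmax()))  # 5 or 6

# Second factorial moment E[X(X-1)] = lam**2, from the (truncated) PMF.
fact2 = float((ks * (ks - 1) * pmf).sum())
print(fact2, lam**2)  # both close to 36
```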

def _validate_mu(mu):
    if isinstance(mu, bool):
        raise TypeError("mu must be a real number, not bool")
    mu = float(mu)
    if mu < 0:
        raise ValueError("mu must be >= 0")
    return mu


def poisson_pmf_array(mu, *, tail=1e-12, max_k=None):
    '''Return (ks, pmf) over k=0..K where CDF is ~1-tail.

    Notes:
    - This is a *truncated* representation of an infinite-support distribution.
    - It is accurate when the remaining tail mass beyond K is negligible.
    '''
    mu = _validate_mu(mu)

    if mu == 0.0:
        return np.array([0], dtype=int), np.array([1.0], dtype=float)

    if max_k is None:
        # Heuristic upper bound: mean + several std devs.
        max_k = int(np.ceil(mu + 12.0 * np.sqrt(mu + 1.0) + 10.0))

    pmf = []
    p0 = math.exp(-mu)  # underflows for extremely large mu
    pmf.append(p0)

    cdf = p0
    k = 0
    while cdf < 1.0 - tail and k < max_k:
        k += 1
        pmf.append(pmf[-1] * mu / k)
        cdf += pmf[-1]

    ks = np.arange(len(pmf), dtype=int)
    pmf = np.asarray(pmf, dtype=float)
    return ks, pmf


def poisson_logpmf(k, mu):
    '''Log PMF for Poisson(mu). Returns -inf outside support.'''
    mu = _validate_mu(mu)

    k_arr = np.asarray(k)
    out = np.full(k_arr.shape, -np.inf, dtype=float)

    k_int = k_arr.astype(int)
    valid = (k_arr == k_int) & (k_int >= 0)
    if not np.any(valid):
        return out

    if mu == 0.0:
        out[valid & (k_int == 0)] = 0.0
        return out

    kv = k_int[valid]

    # log(k!) via log-gamma: log(k!) = lgamma(k+1)
    log_fact = np.vectorize(lambda x: math.lgamma(x + 1.0), otypes=[float])(kv)

    out[valid] = -mu + kv * np.log(mu) - log_fact
    return out


def poisson_pmf(k, mu):
    return np.exp(poisson_logpmf(k, mu))


def poisson_moments(mu):
    mu = _validate_mu(mu)
    if mu == 0.0:
        return {
            "mean": 0.0,
            "var": 0.0,
            "skewness": float("nan"),
            "excess_kurtosis": float("nan"),
            "kurtosis": float("nan"),
        }

    return {
        "mean": mu,
        "var": mu,
        "skewness": 1.0 / math.sqrt(mu),
        "excess_kurtosis": 1.0 / mu,
        "kurtosis": 3.0 + 1.0 / mu,
    }


def poisson_entropy_trunc(mu, *, tail=1e-12):
    '''Approximate entropy (nats) by truncating the PMF.'''
    ks, pmf = poisson_pmf_array(mu, tail=tail)
    pmf = pmf[pmf > 0]
    return float(-(pmf * np.log(pmf)).sum())


def poisson_entropy_asymptotic(mu):
    '''Large-mu approximation for entropy (nats).'''
    mu = _validate_mu(mu)
    if mu == 0.0:
        return 0.0
    return float(0.5 * np.log(2.0 * np.pi * np.e * mu) - 1.0 / (12.0 * mu))
mu = 6.0
moments = poisson_moments(mu)
{
    **moments,
    "entropy_trunc_nats": poisson_entropy_trunc(mu),
    "entropy_asymptotic_nats": poisson_entropy_asymptotic(mu),
}
{'mean': 6.0,
 'var': 6.0,
 'skewness': 0.4082482904638631,
 'excess_kurtosis': 0.16666666666666666,
 'kurtosis': 3.1666666666666665,
 'entropy_trunc_nats': 2.299299563148505,
 'entropy_asymptotic_nats': 2.3009293789298115}
# Monte Carlo check (matches formulas up to sampling error)
mu = 8.0
samples = rng.poisson(lam=mu, size=200_000)

est_mean = samples.mean()
est_var = samples.var(ddof=0)

{
    "formula_mean": mu,
    "mc_mean": float(est_mean),
    "formula_var": mu,
    "mc_var": float(est_var),
}
{'formula_mean': 8.0,
 'mc_mean': 8.00111,
 'formula_var': 8.0,
 'mc_var': 7.9929587679}

5) Parameter Interpretation#

The single parameter (\lambda) (SciPy: mu) is both:

  • the mean number of events in the exposure

  • the variance of the count

If (\lambda = rT) comes from a Poisson process, then:

  • (r) is a rate (“events per unit exposure”)

  • (T) is the exposure (“how long / how big the window is”)

Shape changes#

  • Small (\lambda): most of the mass sits at 0 and 1, and the distribution is strongly right-skewed.

  • Moderate (\lambda): the distribution spreads out; the mode moves right.

  • Large (\lambda): the distribution becomes approximately symmetric and close to (\mathcal{N}(\lambda,\lambda)).

mu_values = [0.5, 1.5, 4.0, 10.0]
mu_max = max(mu_values)

ks, _ = poisson_pmf_array(mu_max, tail=1e-12)

fig = go.Figure()
for mu in mu_values:
    fig.add_trace(
        go.Scatter(
            x=ks,
            y=poisson_pmf(ks, mu),
            mode="markers+lines",
            name=f"mu={mu}",
        )
    )

fig.update_layout(
    title="Poisson PMF for different mu",
    xaxis_title="k",
    yaxis_title="P(X=k)",
)
fig.show()

6) Derivations#

Expectation#

Starting from the PMF, [ \mathbb{E}[X] = \sum_{k=0}^\infty k\,\frac{e^{-\lambda}\lambda^k}{k!}. ] Use the identity (k\lambda^k/k! = \lambda\,\lambda^{k-1}/(k-1)!) to shift the sum: [ \mathbb{E}[X] = e^{-\lambda}\sum_{k=1}^\infty k\frac{\lambda^k}{k!} = \lambda e^{-\lambda}\sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda}\sum_{j=0}^\infty \frac{\lambda^{j}}{j!} = \lambda. ]

Variance#

A standard route is via factorial moments: [ \mathbb{E}[X(X-1)] = \sum_{k=0}^\infty k(k-1)\,\frac{e^{-\lambda}\lambda^k}{k!}. ] Since (k(k-1)\lambda^k/k! = \lambda^2\,\lambda^{k-2}/(k-2)!), the same shift gives (\mathbb{E}[X(X-1)] = \lambda^2). Then [ \mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \big(\mathbb{E}[X(X-1)] + \mathbb{E}[X]\big) - \lambda^2 = (\lambda^2 + \lambda) - \lambda^2 = \lambda. ]

Likelihood (i.i.d. sample)#

If (x_1,\dots,x_n) are i.i.d. (\text{Poisson}(\lambda)), the likelihood is [ L(\lambda) = \prod_{i=1}^n \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}. ] The log-likelihood is [ \ell(\lambda) = \sum_{i=1}^n \big(x_i\log\lambda - \lambda - \log(x_i!)\big). ] Differentiate and set to zero: [ \ell'(\lambda) = \frac{\sum_i x_i}{\lambda} - n = 0 \quad\Rightarrow\quad \hat\lambda_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n x_i. ] So the MLE is the sample mean.
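A quick numerical check of the i.i.d. result: maximizing the log-likelihood over a grid recovers the sample mean. (A sketch; the constant (-\sum_i \log(x_i!)) is dropped since it does not affect the argmax.)

```python
import numpy as np

rng_demo = np.random.default_rng(3)
data = rng_demo.poisson(4.0, size=1_000)

lam_grid = np.linspace(0.5, 10.0, 4001)
# Log-likelihood up to an additive constant that does not depend on lambda.
ll = data.sum() * np.log(lam_grid) - len(data) * lam_grid

lam_hat = float(lam_grid[ll.argmax()])
print(lam_hat, float(data.mean()))  # agree up to the grid spacing
```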

# Visualize the log-likelihood for lambda (single observation)
k_obs = 7
lam_grid = np.linspace(1e-6, 20.0, 600)

logL = k_obs * np.log(lam_grid) - lam_grid - math.lgamma(k_obs + 1)
lam_hat = k_obs

fig = go.Figure()
fig.add_trace(go.Scatter(x=lam_grid, y=logL, mode="lines", name="log-likelihood"))
fig.add_vline(
    x=lam_hat,
    line_dash="dash",
    line_color="black",
    annotation_text=f"MLE λ̂={lam_hat}",
)
fig.update_layout(title=f"Poisson log-likelihood (k={k_obs})", xaxis_title="λ", yaxis_title="ℓ(λ)")
fig.show()

7) Sampling & Simulation#

Below are two NumPy-only samplers that illustrate common ideas.

A) Knuth’s algorithm (product of uniforms)#

A Poisson process perspective: the number of events in time (T) with rate (r) is Poisson with (\lambda=rT). Knuth’s algorithm draws uniforms and multiplies them until the product drops below (e^{-\lambda}). The loop runs about (\lambda) iterations on average, so it is excellent for small (\lambda) and slow for large (\lambda).

B) Inverse CDF sampling (with truncated tail)#

If we can compute the CDF (F(k)) on (k=0,1,2,\dots,K) such that (F(K)\approx 1), then sampling is:

  1. draw (U\sim\text{Uniform}(0,1))

  2. return the smallest (k) such that (F(k)\ge U) (searchsorted)

This is exact up to the (tiny) omitted tail mass beyond (K).

def sample_poisson_knuth(mu, size=1, *, rng: np.random.Generator):
    mu = _validate_mu(mu)

    size = (size,) if isinstance(size, int) else tuple(size)
    out = np.empty(size, dtype=int)

    if mu == 0.0:
        out.fill(0)
        return out

    L = math.exp(-mu)

    for idx in np.ndindex(out.shape):
        k = 0
        p = 1.0
        while p > L:
            k += 1
            p *= rng.random()
        out[idx] = k - 1

    return out


def sample_poisson_inverse_cdf(mu, size=1, *, rng: np.random.Generator, tail=1e-12):
    mu = _validate_mu(mu)

    size = (size,) if isinstance(size, int) else tuple(size)
    if mu == 0.0:
        return np.zeros(size, dtype=int)

    ks, pmf = poisson_pmf_array(mu, tail=tail)
    cdf = np.cumsum(pmf)
    cdf[-1] = 1.0  # absorb the omitted tail mass

    u = rng.random(size)
    idx = np.searchsorted(cdf, u, side="left")
    return ks[idx]
mu = 5.0
size = 50_000

x_knuth = sample_poisson_knuth(mu, size=size, rng=rng)
x_inv = sample_poisson_inverse_cdf(mu, size=size, rng=rng)

{
    "knuth_mean": float(x_knuth.mean()),
    "inv_cdf_mean": float(x_inv.mean()),
    "theory_mean": mu,
    "knuth_var": float(x_knuth.var(ddof=0)),
    "inv_cdf_var": float(x_inv.var(ddof=0)),
    "theory_var": mu,
}
{'knuth_mean': 5.00228,
 'inv_cdf_mean': 5.00994,
 'theory_mean': 5.0,
 'knuth_var': 4.999994801600001,
 'inv_cdf_var': 4.968961196399998,
 'theory_var': 5.0}

8) Visualization#

We’ll visualize:

  • the PMF (k \mapsto \Pr(X=k))

  • the CDF (k \mapsto \Pr(X\le k))

  • a Monte Carlo histogram compared to the theoretical PMF

mu = 7.0
ks, pmf = poisson_pmf_array(mu, tail=1e-12)
cdf = np.cumsum(pmf)

fig_pmf = go.Figure()
fig_pmf.add_trace(go.Bar(x=ks, y=pmf, name="PMF"))
fig_pmf.update_layout(title=f"Poisson PMF (mu={mu})", xaxis_title="k", yaxis_title="P(X=k)")
fig_pmf.show()

fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=ks, y=cdf, mode="lines", line_shape="hv", name="CDF"))
fig_cdf.update_layout(title=f"Poisson CDF (mu={mu})", xaxis_title="k", yaxis_title="P(X≤k)")
fig_cdf.show()

# Monte Carlo vs PMF
mc = sample_poisson_inverse_cdf(mu, size=80_000, rng=rng)

hist = np.bincount(mc, minlength=len(ks)) / len(mc)

fig_mc = go.Figure()
fig_mc.add_trace(go.Bar(x=ks, y=hist[: len(ks)], name="MC histogram", opacity=0.6))
fig_mc.add_trace(go.Scatter(x=ks, y=pmf, mode="markers+lines", name="PMF"))
fig_mc.update_layout(
    title=f"Monte Carlo vs PMF (mu={mu})",
    xaxis_title="k",
    yaxis_title="probability",
)
fig_mc.show()

Approximations in action#

Two classic approximations are worth seeing:

  • Binomial (\to) Poisson when events are rare (large (n), small (p), (np\approx\lambda))

  • Normal (\approx) Poisson when (\lambda) is large

def binom_pmf(k, n, p):
    k_arr = np.asarray(k)
    k_int = k_arr.astype(int)

    out = np.zeros_like(k_arr, dtype=float)

    valid = (k_int == k_arr) & (k_int >= 0) & (k_int <= n)
    if not np.any(valid):
        return out

    kv = k_int[valid]

    log_coeff = (
        math.lgamma(n + 1)
        - np.vectorize(math.lgamma)(kv + 1)
        - np.vectorize(math.lgamma)(n - kv + 1)
    )
    log_pmf = log_coeff + kv * math.log(p) + (n - kv) * math.log1p(-p)

    out[valid] = np.exp(log_pmf)
    return out


# A) Binomial -> Poisson
n = 300
p = 0.02
lam = n * p

ks = np.arange(0, 25)

pmf_binom = binom_pmf(ks, n, p)
pmf_pois = poisson_pmf(ks, lam)

fig = go.Figure()
fig.add_trace(go.Scatter(x=ks, y=pmf_binom, mode="markers+lines", name=f"Bin(n={n}, p={p})"))
fig.add_trace(go.Scatter(x=ks, y=pmf_pois, mode="markers+lines", name=f"Poisson(mu=np={lam:.1f})"))
fig.update_layout(title="Rare-event limit: Binomial vs Poisson", xaxis_title="k", yaxis_title="PMF")
fig.show()


# B) Normal approximation (continuity-corrected)
lam = 40.0
ks, pmf = poisson_pmf_array(lam, tail=1e-12)

# Approximate P(X=k) ≈ Φ((k+0.5-λ)/sqrt(λ)) - Φ((k-0.5-λ)/sqrt(λ))
from math import erf, sqrt

def std_norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

sigma = math.sqrt(lam)
normal_approx = np.array(
    [
        std_norm_cdf((k + 0.5 - lam) / sigma) - std_norm_cdf((k - 0.5 - lam) / sigma)
        for k in ks
    ],
    dtype=float,
)

fig = go.Figure()
fig.add_trace(go.Scatter(x=ks, y=pmf, mode="markers", name="Poisson PMF"))
fig.add_trace(go.Scatter(x=ks, y=normal_approx, mode="lines", name="Normal approx"))
fig.update_layout(title=f"Normal approximation (mu={lam})", xaxis_title="k", yaxis_title="probability")
fig.show()

9) SciPy Integration#

SciPy provides a full-featured implementation via scipy.stats.poisson.

Common methods:

  • poisson.pmf(k, mu) / poisson.logpmf(k, mu)

  • poisson.cdf(k, mu) / poisson.sf(k, mu)

  • poisson.rvs(mu, size=..., random_state=...)

  • scipy.stats.fit(poisson, data, bounds=..., method="mle") (generic fitting API for discrete/continuous distributions)

import scipy.stats as st
from scipy.stats import poisson

mu = 6.5
ks = np.arange(0, 25)

pmf_scipy = poisson.pmf(ks, mu)
cdf_scipy = poisson.cdf(ks, mu)
samples_scipy = poisson.rvs(mu, size=20_000, random_state=rng)

# For the canonical Poisson (loc=0), the MLE for mu is just the sample mean.
mu_mle = float(samples_scipy.mean())

# SciPy also provides a general-purpose fitter for (discrete or continuous) distributions.
# Because mu has domain [0, ∞), we supply a finite upper bound for numerical optimization.
fit_res = st.fit(
    poisson,
    samples_scipy,
    bounds={"mu": (0.0, max(1.0, mu_mle * 10.0)), "loc": (0.0, 0.0)},
    guess={"mu": mu_mle, "loc": 0.0},
    method="mle",
)

{
    "pmf_sum_over_range": float(pmf_scipy.sum()),
    "cdf_last": float(cdf_scipy[-1]),
    "sample_mean": float(samples_scipy.mean()),
    "theory_mean": mu,
    "mle_mu_hat_closed_form": mu_mle,
    "fit_mu_hat_scipy": float(fit_res.params.mu),
    "fit_loc_hat_scipy": float(fit_res.params.loc),
    "fit_success": bool(fit_res.success),
}
{'pmf_sum_over_range': 0.9999999729280461,
 'cdf_last': 0.9999999729280464,
 'sample_mean': 6.5089,
 'theory_mean': 6.5,
 'mle_mu_hat_closed_form': 6.5089,
 'fit_mu_hat_scipy': 6.508900029898882,
 'fit_loc_hat_scipy': 0.0,
 'fit_success': True}

10) Statistical Use Cases#

A) Hypothesis testing (rate / count)#

A common task: test whether an observed count is unusually high or low compared to a baseline rate.

If (X\sim\text{Poisson}(\lambda_0)) under the null, then an upper-tail p-value is [ \text{p-value} = \Pr(X \ge k_\text{obs} \mid \lambda_0) = 1 - F(k_\text{obs}-1; \lambda_0). ]

B) Bayesian modeling (Gamma–Poisson conjugacy)#

With a Gamma prior on (\lambda) and Poisson likelihood, the posterior is also Gamma. This is a workhorse model for counts.

C) Generative modeling#

Poisson counts appear in simulation pipelines (arrivals, defects, clicks). In conditional models, (\lambda) is linked to features via (\lambda_i = \exp(x_i^\top\beta)\times \text{exposure}_i).

# A) Hypothesis testing example: "are we seeing more events than usual?"
from scipy.stats import chi2

k_obs = 12
lambda0 = 5.0

p_upper = poisson.sf(k_obs - 1, lambda0)  # P(X >= k_obs)

# Exact (central) confidence interval for lambda using chi-square quantiles
alpha = 0.05

if k_obs == 0:
    ci_low = 0.0
else:
    ci_low = 0.5 * chi2.ppf(alpha / 2, 2 * k_obs)
ci_high = 0.5 * chi2.ppf(1 - alpha / 2, 2 * (k_obs + 1))

{
    "k_obs": k_obs,
    "lambda0": lambda0,
    "upper_tail_p_value": float(p_upper),
    "95%_CI_for_lambda": (float(ci_low), float(ci_high)),
}
{'k_obs': 12,
 'lambda0': 5.0,
 'upper_tail_p_value': 0.0054530919130093445,
 '95%_CI_for_lambda': (6.200575108722218, 20.96158504817696)}
# B) Bayesian modeling: Gamma prior + Poisson likelihood
from scipy.stats import gamma, nbinom

# Prior: lambda ~ Gamma(alpha0, rate=beta0)
alpha0, beta0 = 2.0, 1.0

# Data: n independent Poisson draws (unit exposure)
data = rng.poisson(lam=4.5, size=40)

alpha_post = alpha0 + data.sum()
beta_post = beta0 + len(data)

posterior_mean = alpha_post / beta_post
posterior_ci = gamma.ppf([0.025, 0.975], a=alpha_post, scale=1.0 / beta_post)

# Plot prior vs posterior
lam_grid = np.linspace(0, 12, 600)

fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=lam_grid,
        y=gamma.pdf(lam_grid, a=alpha0, scale=1.0 / beta0),
        mode="lines",
        name=f"prior Gamma({alpha0},{beta0}) (rate)",
    )
)
fig.add_trace(
    go.Scatter(
        x=lam_grid,
        y=gamma.pdf(lam_grid, a=alpha_post, scale=1.0 / beta_post),
        mode="lines",
        name=f"posterior Gamma({alpha_post:.1f},{beta_post:.1f}) (rate)",
    )
)
fig.update_layout(title="Gamma–Poisson conjugacy", xaxis_title="lambda", yaxis_title="density")
fig.show()

# Posterior predictive for a future count X_new (unit exposure): Negative Binomial
# If lambda ~ Gamma(alpha_post, rate=beta_post) and X|lambda ~ Poisson(lambda), then
# X_new ~ NegBin(n=alpha_post, p=beta_post/(beta_post+1)) in SciPy's parameterization.

n = alpha_post
p = beta_post / (beta_post + 1.0)
ks = np.arange(0, 30)
pred_pmf = nbinom.pmf(ks, n, p)

fig = go.Figure()
fig.add_trace(go.Bar(x=ks, y=pred_pmf, name="posterior predictive"))
fig.update_layout(title="Posterior predictive (Negative Binomial)", xaxis_title="k", yaxis_title="P(X_new=k)")
fig.show()

{
    "prior_mean": alpha0 / beta0,
    "posterior_mean": float(posterior_mean),
    "posterior_95%_CI": (float(posterior_ci[0]), float(posterior_ci[1])),
}
{'prior_mean': 2.0,
 'posterior_mean': 4.097560975609756,
 'posterior_95%_CI': (3.5013649093750003, 4.7399398087126885)}
# C) Generative modeling example: Poisson regression-style simulation

n = 300
x = rng.normal(size=n)
exposure = rng.uniform(0.5, 2.0, size=n)  # e.g. time at risk

beta0, beta1 = 1.0, 0.6

# log-link: lambda_i = exp(beta0 + beta1 * x_i) * exposure_i
lam = np.exp(beta0 + beta1 * x) * exposure

y = rng.poisson(lam=lam)

df = {
    "x": x,
    "exposure": exposure,
    "lambda": lam,
    "y": y,
}

fig = px.scatter(df, x="lambda", y="y", opacity=0.6)
fig.update_layout(title="Simulated counts vs true mean (lambda)", xaxis_title="lambda", yaxis_title="count y")
fig.show()

{
    "mean_lambda": float(lam.mean()),
    "mean_y": float(y.mean()),
    "var_y": float(y.var(ddof=0)),
}
{'mean_lambda': 3.76247509267191,
 'mean_y': 3.7933333333333334,
 'var_y': 10.923955555555557}

11) Pitfalls#

Invalid parameters#

  • (\lambda < 0) is invalid.

  • (\lambda = 0) is valid but degenerate: (\Pr(X=0)=1).

Numerical issues#

  • Direct PMF computation can underflow/overflow for large (k) or (\lambda). Prefer logpmf and stable special functions.

  • Summing the PMF naively to get the CDF may lose precision in the tails. Prefer scipy.stats.poisson.cdf/sf.
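Both numerical points can be illustrated with `scipy.stats.poisson` (a minimal sketch with exaggerated parameters):

```python
import numpy as np
from scipy.stats import poisson

mu, k = 1000.0, 800

# exp(-1000) underflows and 1000**800 overflows in float64, so the naive PMF
# formula fails; logpmf stays finite and can be exponentiated safely.
logp = poisson.logpmf(k, mu)
print(logp, np.exp(logp))

# Deep in the upper tail, sf(k) keeps full precision while 1 - cdf(k)
# suffers catastrophic cancellation (and may round to exactly 0).
k_tail = 1400
print(poisson.sf(k_tail, mu))
print(1.0 - poisson.cdf(k_tail, mu))
```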

Modeling issues (the big ones)#

  • Over-dispersion: real count data often has variance (>) mean (heterogeneity, clustering). Consider Negative Binomial, quasi-Poisson, or hierarchical models.

  • Zero inflation: too many zeros vs Poisson; consider zero-inflated models.

  • Non-constant rate / dependence: if the rate changes over time or events cluster, the Poisson process assumptions fail.

  • Exposure matters: comparing counts without normalizing by exposure can be misleading.

12) Summary#

  • poisson(mu) is a discrete distribution on ({0,1,2,\dots}) with parameter (\mu=\lambda\ge 0).

  • PMF: (\Pr(X=k)=e^{-\lambda}\lambda^k/k!); CDF can be written via incomplete gamma functions.

  • Mean = variance = (\lambda); skewness (=1/\sqrt{\lambda}); excess kurtosis (=1/\lambda).

  • The likelihood yields (\hat\lambda=\bar x) for i.i.d. samples.

  • Useful in hypothesis tests for rates, in Bayesian models via Gamma–Poisson conjugacy, and in generative simulations for count data.